*** This code prepares the CSES data for analysis, including reshaping to respondent-party dyads	***
*** For Tuttnauer & Wegmann in APSR 																***
*** "Voting for Votes: Opposition Parties’ Legislative Activity and Electoral Outcomes"				***


use cses_imd.dta, clear

** Dropping unneeded variables
keep IMD1005 IMD1006_NAM IMD1008_YEAR IMD2001_1 IMD2002 IMD2003 ///
	 IMD3004_LH_PL IMD3008_* IMD3007_* IMD3006 IMD5000_*  IMD3004_LH_DC IMD5012_* IMD3002_OUTGOV

** Renaming variables 
rename IMD1005 id
rename IMD1006_NAM country
rename IMD1008_YEAR year
rename IMD2001_1 age
rename IMD2002 gender
rename IMD2003 education
rename IMD3004_LH_PL prev_vote
rename IMD3004_LH_DC dist_prev_vote
rename IMD3006 lr_self
rename IMD3007_* lr_party_*
rename IMD3008_* like_party_*
rename IMD5000_* party_id_*
rename IMD5012_* expert_lr_party_*
rename IMD3002_OUTGOV vote_gov

** Keeping only the needed countries	
keep if country=="Canada" | country=="Czech Republic" | country=="Finland" | country=="Germany" | ///
	country=="Israel" | country=="Poland" | country=="Sweden" | country=="Great Britain"

** Replacing the vote choice value with district vote choice, in district (instead of list) systems
replace prev_vote=dist_prev_vote if prev_vote==9999995

** Reshaping from one observation per respondent to one observation per respondent-party dyad
reshape long lr_party_ like_party_ expert_lr_party_, i(id) j(party) string
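
* Optional sanity check (an addition, not part of the original script): after the
* reshape, each respondent-party pair should appear exactly once. isid exits with
* an error if id and party do not jointly identify the observations.
isid id party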

** Giving each respondent-party dyad the appropriate party ID
gen cses_id = .
foreach p in A B C D E F G H I {
	replace cses_id = party_id_`p' if party=="`p'"
}

label values cses_id IMD5000
drop party party_id_* dist_*

** Dropping cases with missing party ID
drop if cses_id==9999999

** Recoding missing/no-answer data to system-missing
recode age (9997/9999=.)
recode gender (9=.)
recode education (6/max=.)
recode prev_vote (9999993/max=.)
recode lr_self lr_party_ like_party_ (95/max=.)
recode expert_lr_party_ (99=.)
recode vote_gov (9999997/9999999=.)
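
* Optional check (an addition, not part of the original script): inspect the ranges
* of the recoded respondent-level variables to confirm that no missing-data codes
* remain among the valid values.
summarize age gender education lr_self vote_gov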

** Recoding gender so that 1 = male; 0 = female
recode gender (2=0), gen(male)
drop gender

** Cleaning variable names following the reshaping
rename lr_party_ lr_party
rename like_party_ like_party
rename expert_lr_party_ expert_lr_party

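** Merging with the conflict data and removing unneeded observations
** Note: merge m:m pairs observations within key groups by row order. If the using
** file has one row per party-year (cses_id year), merge m:1 gives the same result
** and is the safer specification.

merge m:m cses_id year using "TW_conflict_data_for_cses.dta"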
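
* Optional diagnostic (an addition, not part of the original script): tabulate the
* merge result to see how many respondent-party dyads matched the conflict data
* (_merge==3), how many are master-only (treated below as cabinet parties), and
* how many rows come only from the using file.
tabulate _merge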

drop if country=="Canada" & year<2006
drop if country=="Israel" & (year<2006 | year>2013)
drop if country=="Spain" & year<2011
drop if country=="Czech Republic" & year<1998
drop if country=="Finland" & year==2015
drop if country=="Great Britain" & year==1997
drop if country=="Lithuania" & year==1997
drop if country=="Poland" & year==1997

** Adding missing vote-share for the Polish SLD in 2011
replace vote_share=.132 if party_id==629 & year==2011

** Generating the cabinet's left-right placement: each respondent's mean perceived position of the cabinet parties
gen cabinet_party=0 if _merge==3 /* Conflict data contains only opposition parties */
replace cabinet_party=1 if _merge==1 /* All non-merged parties in the master file are therefore cabinet parties */
bysort id: egen lr_cabinet = mean(lr_party) if cabinet_party==1
bysort id: egen t = max(lr_cabinet)
replace lr_cabinet = t
drop t cabinet_party


** Keeping only matched observations (opposition parties in the conflict data) and dropping the merge indicator
drop if _merge!=3
drop _merge


** Generating variables of interest

gen perceived_dist = abs(lr_self - lr_party)
gen perceived_extreme = abs(lr_party - 5)
gen expert_extreme = abs(expert_lr_party - 5)
gen perceived_ideogap = abs(lr_party - lr_cabinet)
gen past_vote_for_party=1 if prev_vote==cses_id
recode past_vote_for_party (.=0)
encode country, gen(country_id)

** Interaction terms, computed manually to allow inclusion in coefplot
gen confXvote = conflict_rate*vote_share
gen confXideo = conflict_rate*ideo_gap

** Dropping cases with missing data
drop if like_party==. 
drop if perceived_dist==. 
drop if expert_extreme==.
drop if ideo_gap==.
drop if vote_gov==.
drop if age==. 
drop if male==. 
drop if education==.

** Generating respondent- and party-unique identifiers
egen party_unique = tag(cses_id year)
egen respondent_unique = tag(id)
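
* Optional report (an addition, not part of the original script): the tag variables
* equal 1 for one observation per group and 0 otherwise, so these counts give the
* number of unique party-years and unique respondents in the final sample.
count if party_unique
count if respondent_unique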

** Dropping unneeded variables
drop prev_vote lr_self lr_party expert_lr_party lr_cabinet country year cses_id vote_gov

** Labeling variables
label var id "Respondent ID"
label var age "Respondent age"
label var education "Respondent education level"
label var like_party "Respondent's sympathy toward party"
label var male "Male (1 = yes; 0 = no)"
label var perceived_dist "Respondent's perceived distance from party on 0-10 left-right scale"
label var perceived_extreme "Perceived extremism of party"
label var expert_extreme "Extremism of party according to CSES experts"
label var perceived_ideogap "Perceived party-cabinet distance"
label var past_vote_for_party "Respondent voted for party in previous elections"
label var country_id "Country ID"
label var party_unique "Unique party indicator - once for each party in each survey round"
label var respondent_unique "Unique respondent indicator - once for each unique respondent"
label var confXvote "Interaction term - conflict rate * vote share"
label var confXideo "Interaction term - conflict rate * party-government distance"

save "TW_survey_data.dta", replace
